






Attention-Gated Brain Propagation: How the brain can implement reward-based error backpropagation

Neural Information Processing Systems

The network chooses an action by selecting a unit in the output layer and uses feedback connections to assign credit to the units in successively lower layers that are responsible for this action.
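The credit-assignment idea in this snippet can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions: one sigmoid hidden layer, linear output units read as action values, softmax action selection, a reward-prediction error computed only for the chosen output unit, and feedback weights set equal to that unit's forward weights. All names (`forward`, `select_action`, `update`, `train_bandit`) and sizes are hypothetical.

```python
import math
import random

# Hypothetical layer sizes; the snippet does not specify an architecture.
N_IN, N_HID, N_OUT = 4, 8, 3


def init_weights(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def forward(x, w1, w2):
    # Sigmoid hidden units, linear output units interpreted as action values.
    h = [1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(row, x)))) for row in w1]
    q = [sum(w * hj for w, hj in zip(row, h)) for row in w2]
    return h, q


def select_action(q, rng, temperature=1.0):
    # Softmax (Boltzmann) selection: the network picks one output unit.
    m = max(q)
    exps = [math.exp((v - m) / temperature) for v in q]
    r = rng.random() * sum(exps)
    acc = 0.0
    for a, e in enumerate(exps):
        acc += e
        if r <= acc:
            return a
    return len(q) - 1


def update(x, h, a, delta, w1, w2, lr=0.1):
    # Only the selected unit `a` carries an error. Its feedback weights
    # (assumed equal to the pre-update forward weights w2[a]) gate how much
    # credit each hidden unit receives.
    fb = list(w2[a])
    for j, hj in enumerate(h):
        w2[a][j] += lr * delta * hj
    for j, hj in enumerate(h):
        g = delta * fb[j] * hj * (1.0 - hj)  # sigmoid derivative
        for i, xi in enumerate(x):
            w1[j][i] += lr * g * xi


def train_bandit(trials=3000, seed=0):
    # Hypothetical one-step task: action 0 yields reward 1, others yield 0.
    rng = random.Random(seed)
    w1 = init_weights(N_HID, N_IN, rng)
    w2 = init_weights(N_OUT, N_HID, rng)
    x = [1.0, 0.0, 1.0, 0.0]
    for _ in range(trials):
        h, q = forward(x, w1, w2)
        a = select_action(q, rng)
        reward = 1.0 if a == 0 else 0.0
        update(x, h, a, reward - q[a], w1, w2)
    return forward(x, w1, w2)[1]
```

With symmetric feedback weights, this update coincides with gradient descent on the squared reward-prediction error of the chosen unit, so after training the value of the rewarded action should approach 1 and the others 0.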






eba237eccc24353ccaa4d62013556ac6-AuthorFeedback.pdf

Neural Information Processing Systems

We thank all reviewers for their time and appreciate the thoughtful feedback. Below, we address the main comments. "In the example given by the author, the agent is allowed to run until it reaches a terminal state during [...]" We understand why this would be a concern, but it is not what we do. On the topic of terminal states, note that we have not explicitly defined any terminal states for the tasks in Figure 1. We will clarify this point in the paper. "Their approach was marginally better than DQN on most Atari games [...] it would be nice to see some [...]" We hope that our clarification of the Figure 1 plots has increased your appreciation of low discount factors.


Explain My Surprise: Learning Efficient Long-Term Memory by Predicting Uncertain Outcomes Supplementary Materials

Neural Information Processing Systems

All of these tasks are benchmarks used to test models with long-term memory. It is similar to the previous one. Each image is flattened into a one-dimensional vector. The agent can only move forward along the corridor, and it receives a reward only at the end of the episode.
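The observation and reward structure described above can be sketched as a toy environment. This is a hypothetical illustration, not the benchmark's code: the class name `Corridor`, one image per corridor position, row-major flattening, and a terminal reward of 1.0 are all assumptions.

```python
from typing import List, Tuple

Image = List[List[float]]


def flatten(image: Image) -> List[float]:
    # Row-major flattening of a 2-D image into a 1-D vector.
    return [pixel for row in image for pixel in row]


class Corridor:
    """Hypothetical corridor: the agent can only move forward, and the
    reward (assumed to be 1.0) arrives only at the end of the episode."""

    def __init__(self, length: int, images: List[Image]):
        assert len(images) == length, "one observation per position (assumption)"
        self.length = length
        self.images = images
        self.pos = 0

    def reset(self) -> List[float]:
        self.pos = 0
        return flatten(self.images[self.pos])

    def step(self) -> Tuple[List[float], float, bool]:
        # The only available action is "move forward", so step takes no action.
        self.pos += 1
        done = self.pos >= self.length - 1
        reward = 1.0 if done else 0.0  # sparse reward at the final position only
        return flatten(self.images[self.pos]), reward, done
```

Because every intermediate step returns zero reward, any information needed at the end of the corridor must be carried in memory, which is what makes such tasks useful long-term-memory benchmarks.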